This  report  traces  the  flow  of  data  from  the 
time  of  capture  by  the  minicomputer-based  NMM  through 
the  several  phases  of  modeling  and  structuring  in  the 
DAP.  Included  in  this  description  is  the  statistical 
treatment  of  the  data  which  provides  quantitative 
measures  of  various  aspects  of  network  performance. 

Key  words:  Computer  networks;  data  analysis; 
interactive;  network  service;  performance 
evaluation;   performance  measurement;   service. 


1.   Introduction


This  technical  note  is  one  of  a  set  containing  detailed 
technical  information  concerning  the  Network  Measurement 
System  (NMS).  The  NMS  represents  an  implementation  of  a  new 
approach  to  the  performance  measurement  and  evaluation  of 
computer  network  systems  and  services. 


This  report  discusses  the  process  of  developing  the  raw 
data  acquired  by  the  data  acquisition  subsystem,  known  as 
the  Network  Measurement  Machine  (NMM),  into  meaningful 
information  through  the  Data  Analysis  Package  (DAP).  The 
DAP  produces  reports  about  the  quality  of  network  service 
delivered  to  interactive  terminal  users  as  well  as  a 
characterization  of  user  demands  and  network  communication 
facility  utilization. 

The  concept  of  service  evaluation  is  discussed  in 
Abrams  and  Cotton  [1975].  The  ways  in  which  the  NMM 
operates  in  order  to  obtain  the  raw  measurement  data  may  be 
found in Rosenthal, Rippy and Wood [1975]. The present
technical note, together with the two just cited, constitutes a
complete technical description of the Network Measurement
System. In addition, the use of network measurement as a
part  of  network  evaluation  is  discussed  in  a  Federal 
Information  Processing  Standards  Guideline  now  in 
preparation  under  the  principal  authorship  of  Abrams, 
Watkins,  and  Cotton  [1975]. 

1.1  Definitions,  Intent,  and  Applicability 

A  computer  network  for  the  purpose  of  this  report  is 
defined  as  one  or  more  computers  connected  to  one  or  more 
interactive  terminals  via  a  communications  facility.  (Pyke 
and  Blanc  [1973]  and  Blanc  [1974]  present  a  discussion  of 
computer  network  technology).  When  discussing  service 
delivered  by  a  network,  the  computers  and  communications 
facility  are  often  considered  as  one  by  the  network  user. 

Computer  installation  managers,  procurement 
specialists,  performance  evaluation  specialists,  and 
computer  users  can  benefit  from  utilization  of  the  NMS.  Of 
course,  their  interests  in  the  details  of  its  operation  will 
vary  widely.  This  report  is  intended  to  convey  conceptual 
information  about  the  functioning  of  the  data  analysis  and 
report  generation  parts  of  that  system.  It  does  not  contain 
implementation  detail  at  the  program  level  suitable  for 
replication  of  the  effort.  It  also  provides  information 
about  the  functional  structure  of  the  NMS  to  others  working 
in  the  network  measurement  field.  In  most  cases  it  may  be 
assumed  that  a  small  group  of  individuals  in  an  organization 
would  have  major  responsibility  for  operation  of  the  NMS  and 
the  presentation  of  its  outputs  to  other  interested  parties. 
The  term  "analyst"  is  applied  to  those  most  closely 
associated  with  the  NMS  operation.  The  level  of  detail 
contained in this report is important to the analyst who
must  interpret  the  NMS  reports  and  present  them  to  others. 


This  report  traces  the  flow  of  data  from  the  time  of 
capture  by  the  minicomputer-based  NMM  through  several  phases 
of  modeling  and  structuring  in  the  DAP.  Included  in  this 
description  is  the  statistical  treatment  of  the  data  which 
provides  quantitative  measures  of  various  aspects  of  network 
performance.

The NMS provides a quantitative basis for Federal
agencies  and  other  network  users  and  operators  to  select  and 
improve  computer  networks  and  network  services.  After  the 
procurement  of  a  network  service,  the  NMS  provides  a  means 
to  assure  that  service  continues  to  be  provided  at  an 
acceptable  level.  For  example,  a  network  serving  a  certain 
number  of  customers  may  provide  the  best  service  during  the 
selection  portion  of  a  procurement  action  and  therefore  be 
awarded  the  contract;  however,  several  months  later  the 
number  of  customers  served  by  the  network  may  double  or 
triple,  causing  the  network  to  perform  at  an  unacceptable 
level.  Periodic  application  of  the  NMM  to  network  services 
would  assure  that  service  continues  to  be  provided  at  a 
contractually  agreed  upon  level. 

1.2  The  Service  Approach 

The  approach  to  measurement  represented  by  the  NMS  is 
service-oriented.  By  focusing  on  the  service  delivered  to 
the  network  customers  at  their  terminals,  rather  than  on  the 
internal  mechanics  of  network  operation,  measurements  can  be 
obtained  which  are  directly  relevant  to  user  needs  and 
management  concerns.  As  discussed  by  Abrams  and  Cotton 
[1975],  a  network  user  at  an  interactive  terminal  is 
concerned  with  service  rather  than  total  system  efficiency. 
It  is  of  little  concern  to  the  user  that  the  network  is 
serving  other  users.  This  only  becomes  a  concern  when  the 
user  is  forced  to  wait  longer  than  usual  for  the  network  to 
answer  a  request.  At  that  point  the  user  is  aware  that 
service  has  degraded  for  some  reason.  The  user  does  not 
care  that  the  Central  Processing  Unit  (CPU),  or  input/output 
devices  are  over-burdened  or  why  they  are;  the  user  only 
knows  that  the  service  being  delivered  by  the  network  is 
unsatisfactory. 

1.3  System  Overview 

The  Network  Measurement  Machine  (NMM),  the  data 
acquisition  component  of  the  NMS,  is  a  minicomputer-based 
system  which  captures  all  characters  transmitted  between  a 
user  at  a  terminal  and  the  computer  network  serving  the 
user.  The  NMM  associates  a  time-tag  (the  time  a  character 
occurs) and the source of the character (user or network)
with each character transmitted.

Data  is  not  structured  or  analyzed  during  acquisition. 
Time-tagged characters and other pertinent information are
written on magnetic tape as rapidly as possible. This is a
consequence of the design criterion to avoid any activity in
the NMM which might compromise the accuracy of the
timing.

Once  recorded,  the  data  are  processed  by  the  DAP. 
Briefly,  the  processing  proceeds  as  follows:  The  multiple 
conversations  on  the  tape  are  separated  into  individual 
conversations  by  demultiplexing  the  recording.  Each 
individual  conversation  is  then  processed  to  remove 
character echoes, and scanned to build a structure file which
contains  pointers  to  the  user  and  network  messages.  The 
network  software  resources  utilized  by  each  message  are 
identified  and  noted  in  the  structure  file.  Individual 
conversations  may  be  analyzed,  reports  generated,  a  file 
written  for  additional  data  processing  by  independent 
statistical  packages,  and  another  file  created  for  use  in 
the  analysis  of  multiple  conversations. 

1.4  The  NBS  Implementation 

The  NMM  is  implemented  on  a  Digital  Equipment 
Corporation PDP 11/20 and has the capacity to acquire data
on  eight  simultaneous,  independent,  character  asynchronous 
user/network  interactions.  The  standard  hardware  includes  a 
high  precision  programmable  clock  and  a  set  of 
communications  interfaces.  Special  hardware  connects  the 
NMM  to  the  network  to  be  measured.  This  special  hardware 
includes  an  automatic  calling  unit  and  line  selector  system 
for  computer-controlled  originating  and  answering  of  data 
calls  via  common  carrier  communications  facilities. 

The  NMM  has  a  special  operating  system  which  is  a 
real-time  interrupt  driven  scheduler  incorporating  various 
device  drivers  and  handlers.  The  user  and  network 
characters  and  related  information  acquired  during 
application  of  the  NMM  are  stored  on  a  seven  track  magnetic 
tape.  Detailed  information  concerning  NMM  hardware  and 
software  is  presented  in  Rosenthal,  Rippy  and  Wood  [1975]. 

The  Data  Analysis  Package  (DAP)  processes  the  data 
acquired  by  the  NMS.  The  DAP  is  implemented  on  a  Digital 
Equipment  Corporation  DECSystem  10.  The  magnetic  tapes 
generated  by  the  NMM  are  transferred  to  the  analysis  machine 
for  subsequent  processing. 


2.   Transformation  of  the  NMM  Data 

Each  NMM-generated  magnetic  tape  may  contain  data 
acquired  on  one  or  several  days  of  NMM  operation;  or  one 
day  of  operation  may  produce  a  multi-reel  file.  In  any 
case,  the  DAP  creates  an  independent  disc  file  for  each  day 
represented  on  the  tapes.  The  naming  convention  for  files 
on  the  DAP  machine  is: 

FILE.EXT

where FILE is the filename and may consist of six or fewer
alphanumeric characters, and EXT is the filename extension
and may consist of three or fewer alphanumeric characters.
Following this convention, a disc file which represents one
day's accumulation of data is referenced as a .DMT file.
2.1  Description  of  the  NMM  Data 

The  magnetic  tape  produced  by  the  NMM  is  organized  into 
information records (1). There are five record types, which
define 1) the configuration of the user/NMM connection, 2)
the time and date when the data were acquired, 3) the software
version number of the NMM, 4) the destination network to which
the user was connected, and 5) the user/network characters with
associated time-tags.


(1)  A  record  is   a   collection   of   related   items   of   data 
treated  as  a  unit.   See  FIPS  11  [1970]. 


The  configuration  record  contains  information  related 
to  the  hardware  configuration  of  the  user's  connection  to 
the  NMM  at  the  time  of  the  recording.  Such  details  as  the 
type  of  interface,  the  type  of  connection,  and  the 
transmission  rate  of  the  interface  are  contained  in  this 
record.

The  NMM  uses  a  24  bit  clock  which  is  incremented  10,000 
times  per  second.  Therefore,  approximately  every  20  minutes 
a  clock  overflow  occurs.  Each  time  there  is  a  clock 
overflow  a  time/date  record  which  contains  the  current  time 
of  day  and  Julian  date  is  generated. 

The version record ensures that the analysis routines
treat the data appropriately for the version of the NMM
software which was used in acquiring the data. Thus
compatibility between the NMM and the DAP is preserved.

The  destination  record  specifies  the  network  to  which 
the  user  is  connected.  The  information  contained  in  this 
record  is  used  in  the  separation  of  conversations  process 
and  will  be  discussed  in  the  next  section. 

When  the  data  is  acquired  by  the  NMM,  every  character 
transmitted  by  the  user  or  the  network  is  tagged  with  the 
current  time  in  clock  ticks.  A  record  containing  a 
character  and  time-tag  is  called  a  character  record. 

Every  record  contains  a  unit  number  indicating  the 
source  of  the  information  contained  in  the  record.  A 
conversation  or  dialogue  has  associated  with  it  two  unit 
numbers,  one  for  the  user  portion  and  one  for  the  network 
portion.  All  even  unit  numbers  refer  to  the  user  portions 
of  conversations,  while  odd  refer  to  the  network  portions. 
Once  a  conversation  is  initiated,  the  unit  numbers 
associated  with  that  conversation  are  constant.  When  the 
conversation  is  terminated,  those  units  become  available  for 
another  user/network  dialogue. 
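
The unit-number convention can be illustrated with a short Python sketch that groups interleaved records by conversation. It assumes, purely for illustration, that the two units of one conversation are numbered consecutively (2k for the user, 2k+1 for the network); the text states only the even/odd rule. The (unit, character) record layout and the function names are hypothetical.

```python
from collections import defaultdict


def source_of(unit):
    """Even unit numbers carry user data, odd unit numbers network data."""
    return "user" if unit % 2 == 0 else "network"


def group_by_conversation(records):
    """Group (unit, char) records by an assumed pair index, unit // 2.

    Pairing units 2k and 2k+1 is an assumption made for this sketch;
    the report states only the even/odd convention.
    """
    conversations = defaultdict(list)
    for unit, char in records:
        conversations[unit // 2].append((source_of(unit), char))
    return dict(conversations)


# Two conversations interleaved in arrival order.
records = [(0, "A"), (2, "x"), (1, "B"), (3, "y")]
grouped = group_by_conversation(records)
```

Each value in `grouped` holds one dialogue in arrival order, with every character labelled by its source.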

2.2  Demultiplexing  Conversations 

Records are ordered on the magnetic tape in the same
sequence  as  they  arrive  at  the  NMM.  This  implies  that 
multiple  conversations  are  interleaved  on  a  tape. 

The  .DMT  files  are  scanned  by  a  routine,  RAW,  which 
creates  a  new  set  of  disc  files  referenced  as  .RAW  files. 
There  is  one  .RAW  file  created  for  each  dialogue  represented 
on  the  .DMT  file.  RAW  demultiplexes  the  interleaved 
dialogues by recognizing pairs of unit numbers and creating
an individual file for each pair. Global information such
as  configuration  details,  version,  and  destination  network 
is  placed  in  a  header  to  the  .RAW  files  reserved  for  such 
information.  When  the  24-bit  clock  employed  by  the  NMM 
overflows,  the  DAP  extends  the  time-tag  to  27  bits. 
Overflow  of  a  27  bit  time-tag  does  not  occur  for 
approximately  two  and  three-fourths  hours.  When  the  27  bit 
clock  overflows,  timing  adjustments  are  made  and  the  clock 
is  reset.  The  integrity  of  the  time-tags  is  not  compromised 
in  this  process. 
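
The extension of the time-tag can be sketched as follows. This is a minimal Python illustration, not the DAP routine: it assumes that an overflow of the 24-bit clock is recognised when a reading is smaller than the one before it, and it does not model the timing adjustment made when the 27-bit value itself overflows. The function name is hypothetical.

```python
TAG_BITS = 24        # width of the NMM clock, from the text
EXTENDED_BITS = 27   # width of the extended time-tag, from the text


def extend_time_tags(tags24):
    """Extend a sequence of 24-bit clock readings to 27 bits.

    Assumption for this sketch: an overflow is recognised when a
    reading is smaller than its predecessor. The reset performed
    after a 27-bit overflow is not modelled; the value simply wraps.
    """
    overflows = 0
    previous = None
    extended = []
    for tag in tags24:
        if previous is not None and tag < previous:
            overflows += 1
        previous = tag
        value = (overflows << TAG_BITS) | tag
        extended.append(value & ((1 << EXTENDED_BITS) - 1))
    return extended


# The third reading follows an overflow of the 24-bit clock.
tags = [16_777_000, 16_777_200, 50, 300]
extended = extend_time_tags(tags)
```

The three additional bits count overflows of the 24-bit clock, so the extended values keep increasing across the overflow.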

Each  character  with  associated  time-tag  and  source 
(user  or  network)  is  placed  in  one  word  of  the  .RAW  file  (7 
bits  reserved  for  the  character,  27  for  the  time-tag  and  1 
for  the  source).  The  time  and  date  information  requires  two 
consecutive  words  in  .RAW.  Character  and  time/date  records 
in  the  .RAW  files  are  distinguished  by  a  flag  bit:  if  the 
flag  equals  zero  then  the  record  is  a  one  word  character 
record,  if  the  flag  equals  one  then  the  next  two  words 
contain  a  time/date  record.  Figure  1  illustrates  the  three 
types  of  information  found  in  a  .RAW  file. 
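
The field widths given above (7 + 27 + 1 bits, plus the flag bit) fill one 36-bit word. The following Python sketch packs and unpacks such a word. The ordering of the fields within the word and the meaning of the source bit are assumptions made for illustration, since the layout itself is not stated in the text.

```python
CHAR_BITS = 7    # character field width, from the text
TIME_BITS = 27   # time-tag field width, from the text


def pack_character_word(char_code, time_tag, from_network):
    """Pack one character record into a 36-bit integer.

    Assumed layout, high to low: flag (1 bit, zero for a character
    record), source (1 bit, one meaning network), time-tag (27 bits),
    character (7 bits).
    """
    if not 0 <= char_code < (1 << CHAR_BITS):
        raise ValueError("character must fit in 7 bits")
    if not 0 <= time_tag < (1 << TIME_BITS):
        raise ValueError("time-tag must fit in 27 bits")
    source = 1 if from_network else 0
    return (source << (TIME_BITS + CHAR_BITS)) | (time_tag << CHAR_BITS) | char_code


def unpack_word(word):
    """Return (flag, source, time_tag, char_code) under the assumed layout."""
    flag = (word >> 35) & 1
    source = (word >> 34) & 1
    time_tag = (word >> CHAR_BITS) & ((1 << TIME_BITS) - 1)
    char_code = word & ((1 << CHAR_BITS) - 1)
    return flag, source, time_tag, char_code


word = pack_character_word(ord("A"), 123_456, from_network=True)
fields = unpack_word(word)
```

A flag of zero in `fields` marks the word as a character record, matching the rule that a flag of one introduces a two-word time/date record.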

The  naming  convention  for  the  .RAW  files  is  as  follows: 

DDDNNN.RAW 


where 


DDD     is  the  ordinal  number  of  the  day, 
NNN     is  the  conversation  position  on  .DMT 


All  conversations  occurring  within  a  given  decimal  year  are 
maintained  in  a  directory  which  incorporates  that  year  in 
its  name.  For  example,  the  third  conversation  recorded  on 
June 14, 1975, is stored in the file 165003.RAW in the
directory  for  1975. 
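
The naming convention can be checked with a short Python sketch. The function name is hypothetical, and zero-padding of both fields to three digits is an assumption inferred from the example 165003.RAW.

```python
from datetime import date


def raw_file_name(recorded_on, position):
    """Build a DDDNNN.RAW name: day of the year, then position on the .DMT file."""
    if not 1 <= position <= 999:
        raise ValueError("position must fit in three digits")
    day_of_year = recorded_on.timetuple().tm_yday
    return f"{day_of_year:03d}{position:03d}.RAW"


# Third conversation recorded on June 14, 1975.
name = raw_file_name(date(1975, 6, 14), 3)
```

June 14 is the 165th day of 1975, so the sketch reproduces the file name given in the example.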
Figure  1.  Elements  of  a  .RAW  File 


Figure  2  illustrates  the  reconfiguration  of  a 
hypothetical  .DMT  file  into  its  component  .RAW  files.  The 
first  word  of  each  file  diagram  shows  the  format  of  the  data 
contained  in  the  file. 

2.3  Recognition  of  Conversation  Boundaries 
The destination record contains an integer code
designating the network connection made. Each connection
has a distinct log-in and log-out procedure. Therefore, all
that is necessary is to search for the string of characters
that represents the termination of a conversation on that
network; its occurrence indicates that the .RAW file is
complete. To include the case of a network crash or unplanned
disconnect, the string of characters representing a log-in
request is included in the search; its occurrence indicates
that the preceding .RAW file should be terminated and a new
.RAW file begun. This search procedure is accomplished by a
sliding character string match against the data being
processed. This match is implemented in software as a
finite state machine.
Currently the possible states are:

   < -3     testing for beginning of conversation
   -3       accept any character as a beginning
   -2       terminate current .RAW file; begin a new .RAW
   -1       terminate .RAW file
   0        between conversations
   1-999    conversation in progress

To illustrate, Figure 3 is the procedure for one of the
current network connections specified in a destination
record, the TIP (2). TIP commands always start with the
symbol @ and end with either a carriage return (optionally
linefeed) or a rubout, depending on whether the user wishes
the command to be issued or aborted. The exception is
the specific command @@, which causes the TIP to insert one @
in the data stream to the host computer. Normally, TIP
connection to a host computer is requested by an "@L", an
"@O", or an "@H," while termination of a connection is
initiated by an "@C." The state is initialized at 0 and
remains there until a beginning of conversation is signaled;
upon receipt of an "@" the state moves to -3. If the next
character is a "C," the state becomes -1. After termination
of the current raw file, the state becomes 0. If the "@" is
followed by an "L," an "O," or an "H," the next state
becomes -2. After termination of the current raw file and
opening of a new one, the state becomes 1 to indicate the
processing of a conversation. Once at state 1, an "@" is
the only character that can signal an exit. Upon receipt of
an "@" the next state is 2. If another "@" is received, the
state goes back to 1; otherwise, the state becomes -3.


(2) The Terminal Interface message Processor is the
communications interface to the Advanced Research
Projects Agency Network (ARPANET). See BBN [1974].


Figure 3.   End-of-Conversation Search for TIP
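
The TIP search described above can be sketched as a small state machine in Python. This is an illustration under stated assumptions, not the DAP code. Two points are not fixed by the text and are assumed here: the character that moves the machine from state 2 to state -3 is examined again in state -3, and any other character met in state -3 returns the machine to state 1 if a conversation is open, or to state 0 otherwise. States -1 and -2 are represented by the events they produce. The function name and event labels are hypothetical.

```python
def scan_tip(stream):
    """Return (event, index) pairs found in a TIP character stream.

    "begin" corresponds to state -2 (close any current .RAW file and
    open a new one); "terminate" corresponds to state -1.
    """
    state = 0                  # 0: between conversations
    conversation_open = False
    events = []
    i = 0
    while i < len(stream):
        ch = stream[i]
        if state == 0:
            if ch == "@":
                state = -3
        elif state == 1:       # conversation in progress
            if ch == "@":
                state = 2
        elif state == 2:       # one "@" seen inside a conversation
            if ch == "@":
                state = 1      # "@@" inserts a literal "@"
            else:
                state = -3
                continue       # assumption: re-examine this character
        elif state == -3:      # command letter expected
            if ch == "C":
                events.append(("terminate", i))
                conversation_open = False
                state = 0
            elif ch in "LOH":
                events.append(("begin", i))
                conversation_open = True
                state = 1
            else:              # assumption: not a boundary command
                state = 1 if conversation_open else 0
        i += 1
    return events


normal = scan_tip("@L hello @@ world @C")
crashed = scan_tip("@L abc @L def @C")
```

In `crashed`, the second log-in request appears without an intervening "@C", which is the unplanned-disconnect case: the sketch reports a second "begin" so that the preceding file can be closed and a new one started.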
3.   Models  and  Their  Parameters 

A  model  was  formulated  to  represent  the  dialogue  in 
such  a  manner  as  to  readily  extract  meaningful  measurement 
information. As data were acquired and interpreted by the
model,  it  was  necessary  to  refine  the  model  to  adequately 
account  for  new  types  of  data  and  provide  for  their 
meaningful  interpretation  relative  to  the  measurement 
objectives. 

3.1  Definition  of  the  SAR  Model 


The first model employed by the DAP was a simple
iterated stimulus-response situation (Scherr [1966]) of the
form


          START
            |
            v
    +--> User Transmits Stimulus
    |       |
    |       v
    +--- Network Transmits Response


Of course the text transmitted varies for each iteration of
the model.


This  model  presumes  that  the  user  is  the  "driver",  that 
is,  the  stimulus  is  the  catalyst  which  elicits  a  response 
from  the  network.  In  reality  just  the  opposite  may  be  true 
or  the  "driver"  may  alternate  during  a  dialogue.  However, 
the  stimulus-response  form  appears  adequately  consistent 
with  the  "service"  orientation  of  the  NMS. 
message  group. 

While  the  presence  of  a  stimulus  and  response  in  a 
message  group  may  be  assumed,  the  network  output  must  be 
tested  to  ascertain  the  presence  of  an  acknowledgement.  It 
should  be  noted  that  differentiation  between  acknowledgement 
and  response  is  semantic  and  time  dependent,  that  some 
networks  issue  no  acknowledgement,  that  some  networks  are 
inconsistent  in  their  acknowledgement,  and  that  in  some 
cases  the  acknowledgement  constitutes  the  only  response. 

To  determine  if  an  acknowledgement  is  present,  three 
algorithms  are  used.  At  the  beginning  of  an  analysis 
session,  the  analyst  using  the  DAP  has  the  option  to  specify 
which  of  these  acknowledgement  definitions  is  to  be  used  or 
to  specify  a  new  definition.  The  first  algorithm  requires 
the  specification  of  a  set  of  acknowledgement  characters; 
this  character  set  (the  match  set)  is  used  for  string 
matching.  An  acknowledgement  is  delineated  by  comparing  the 
match  set  with  the  beginning  of  the  character  string  from 
the  network.  The  comparison  begins  with  the  first  network 
character  and  continues  until  the  last  character  of  the 
match set or until a "no match" is encountered. If the end
of the match set is successfully reached, an acknowledgement
is  present;  otherwise,  the  entire  network  output  comprises 
the  response. 

A  second  algorithm  is  based  on  timing  information.  If 
a  delay  in  the  network  output  is  encountered  greater  than  a 
fixed multiple of the character duration, then the output is
divided  into  an  acknowledgement  and  a  response.  A  character 
duration  is  the  time  interval  needed  for  one  character  to  be 
transmitted  and  is  calculated  by  dividing  the  number  of  bits 
required  for  each  character  (including  start  and  stop  bits) 
by the speed of the interface in bits per second. The
interface  speed  information  is  available  in  the 
configuration  record  described  earlier.  The  default 
parameter  is  set  at  50  character  durations;  however,  the 
analyst  may  redefine  this  parameter. 

The  third  algorithm  defines  the  existence  of  an 
acknowledgement  based  on  network  output  beginning  with 
nonprinting  ASCII  control  characters  rather  than  printing 
characters.  All  nonprinting  characters  occurring  at  the 
beginning  of  network  output  until  the  occurrence  of  a 
printing  character  are  considered  within  the 
acknowledgement.

These three algorithms may be applied in combination.
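
The three acknowledgement tests can be sketched in Python as follows. The function names and the representation of network output (a string, with a parallel list of arrival times in seconds for the timing test) are assumptions made for illustration. The sketch treats ASCII codes below 32 and code 127 as nonprinting, and splits at the first sufficiently long delay; neither choice is stated in the text.

```python
def character_duration(bits_per_character, bits_per_second):
    """Time needed to transmit one character, in seconds."""
    return bits_per_character / bits_per_second


def split_by_match_set(output, match_set):
    """First algorithm: the output must begin with the whole match set."""
    if match_set and output.startswith(match_set):
        return output[:len(match_set)], output[len(match_set):]
    return "", output


def split_by_delay(output, times, duration, multiple=50):
    """Second algorithm: split at the first gap longer than multiple * duration."""
    for i in range(1, len(output)):
        if times[i] - times[i - 1] > multiple * duration:
            return output[:i], output[i:]
    return "", output


def split_by_control_characters(output):
    """Third algorithm: leading nonprinting characters form the acknowledgement."""
    i = 0
    while i < len(output) and (ord(output[i]) < 32 or ord(output[i]) == 127):
        i += 1
    return output[:i], output[i:]


# 10 bits per character (with start and stop bits) at 300 bits per second.
duration = character_duration(10, 300)
by_match = split_by_match_set("\r\nREADY", "\r\n")
by_delay = split_by_delay("OKDONE", [0.0, 0.033, 2.0, 2.033, 2.066, 2.1], duration)
by_control = split_by_control_characters("\r\n\x00\x00LOGIN:")
```

Each function returns an (acknowledgement, response) pair; an empty acknowledgement means that the entire network output is taken as the response.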
Figure  4.  Time  Intervals  During  and  Between  Transmissions 


Since  transmission  and  delay  times  are  measured  by  the 
elapsed  time  between  two  events,  there  are  numerous 
measurements  which  may  be  made  by  selecting  different 
events.  Examining  the  literature  on  measurement  techniques 
(Agajanian  [1975]),  it  is  apparent  that  experimenters  do  not 
agree  on  the  significance  of  events,  especially  related  to 
network  output.  Because  of  the  multiplicity  of  definitions 
about  to  be  introduced,  the  names  of  events  will  explicitly 
state  the  interval  being  measured. 
delay.  Finally,  the  time  between  the  end  of  the 
acknowledgement  and  the  last  character  of  the  response  is 
the  "acknowledgement  to  end  of  response"  delay.  In  dealing 
with  networks  that  do  not  issue  acknowledgements,  the 
network  delay  time  definitions  would  be  the  same  as  above 
except  for  the  substitution  of  stimulus  for  every  occurrence 
of  acknowledgement.  The  most  commonly  used  measure  of 
delay,  conventionally  called  "response  time,"  is  the  elapsed 
time  from  the  end  of  the  stimulus  to  the  beginning  of  the 
network's  output.  (The  acknowledgement/response  distinction 
is  not  recognized.)  This  is  termed  the  "stimulus  to 
response"  delay. 
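
Two of the delays defined above can be computed directly from the time-tags of a message group, as the following Python sketch shows. The list-of-ticks representation and the function names are assumptions; the rate of 10,000 ticks per second is the clock rate given in the description of the NMM data.

```python
TICKS_PER_SECOND = 10_000   # NMM clock rate, from the text


def stimulus_to_response(stimulus_ticks, output_ticks):
    """Seconds from the end of the stimulus to the start of the network output."""
    return (output_ticks[0] - stimulus_ticks[-1]) / TICKS_PER_SECOND


def acknowledgement_to_end_of_response(ack_ticks, response_ticks):
    """Seconds from the end of the acknowledgement to the last response character."""
    return (response_ticks[-1] - ack_ticks[-1]) / TICKS_PER_SECOND


# Hypothetical time-tags, in clock ticks, for one message group.
stimulus = [1_000, 2_000, 3_000]
acknowledgement = [8_000, 8_010]
response = [20_000, 20_010, 20_020]

response_time = stimulus_to_response(stimulus, acknowledgement + response)
tail_delay = acknowledgement_to_end_of_response(acknowledgement, response)
```

For a network that issues no acknowledgement, the stimulus time-tags would be passed in place of the acknowledgement time-tags, following the substitution rule stated above.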

3.3  Communications  Conventions 

In  fitting  the  conversational  data  to  the  appropriate 
model,  the  communications  conventions  had  to  be  considered 
in  order  that  the  data  be  interpreted  correctly.  The 
simplest  computer  communications  convention  to  implement  and 
understand  is  half-duplex.  In  half-duplex  there  may  be  only 
one  source  of  transmission  at  a  time,  the  network  or  the 
user.  As  a  consequence,  the  printer  mechanism  on  the  user's 
terminal  is  directly  connected  to  the  keyboard. 

An  alternate  convention  is  full-duplex  in  which 
transmission can occur in both directions simultaneously.
In  sophisticated  implementations  of  full-duplex  the  user  is 
allowed  to  type  ahead;  that  is,  the  user  does  not  have  to 
wait  for  the  network  to  respond  before  entering  the  next 
stimulus.  A  major  consequence  of  full-duplex  transmission 
is  that  there  is  no  connection  between  printer  and  keyboard 
on  the  user's  terminal.  If  characters  transmitted  by  the 
user  are  printed,  it  is  by  virtue  of  traveling  into  the 
network  and  being  retransmitted  back  to  the  user  terminal 
printer.  This  process  of  retransmission  is  called  echoing. 
Retransmission  may  be  performed  by  a  communications 
processor  in  the  network  or  by  a  remote  computer  connected 
to  the  network.  While  the  communications  conventions  used 
may have implications for network efficiency, the concern of
the  service  approach  to  measurement  is  network  performance 
as  viewed  by  a  user  at  a  terminal,  not  the  internal 
functioning  of  components  of  the  network. 

There  are  several  elements  which  complicate  analysis  of 
full-duplex  usage.  Transformation  of  a  nonprinting 
character  into  a  sequence  of  printing  characters  is  common. 
For  example,  end  of  text  or  ETX  (which  on  many  keyboards  is 
transmitted  by  depressing  the  CONTROL  key  and  the  letter  C 
simultaneously)  may  cause  the  transmission  back  to  the  user 
terminal of the two characters "^C" in sequence. Type-ahead
also introduces complexities. Since the user is not
constrained  to  waiting  for  a  network  response  before 
entering  another  stimulus,  stimuli  may  be  queued  by  the 
network  until  it  processes  them.  At  that  time  a  series  of 
responses  may  be  printed  on  the  user's  terminal. 
Interleaving of responses is another complication that
type-ahead may introduce. In this case a response does not
necessarily immediately follow the stimulus with which it is
associated. Figure 5 illustrates these type-ahead related
complications in contrast to the normal user-network
sequence expected.

Full-duplex  transmission  has  two  major  effects  on  the 
analysis.  First,  it  increases  network  utilization  by  virtue 
of  echoes.  Second,  it  makes  more  difficult  the  demarcation 
between  user  and  network  transmission  segments. 

In  many,  but  not  all,  cases  the  representation  of  a 
full-duplex  conversation  may  be  reduced  to  an  equivalent 
half-duplex  case.  This  transformation  makes  it  possible  to 
employ  the  same  data  reduction  techniques  to  both 
transmission  types.  The  implementation  of  this 
transformation  records  sufficient  data  to  reflect  the 
increased  network  utilization  of  full-duplex  operation. 

Semantic  content  analysis  is  required  to  reduce  the 
full-duplex  record  to  an  equivalent  half-duplex  case 
whenever  character  transformation  and  type-ahead  occur.  The 
current  DAP  does  not  provide  this  capability.  Since 
half-duplex  mode  is  the  dominant  convention  for  the 
interactive  use  of  networks  and  is  the  simplest  to 
understand  and  analyze,  a  half-duplex  model  containing  the 
meaningful  points  of  this  mode  has  been  created.  Whenever 
possible,  data  obtained  from  a  full-duplex  conversation  are 
transformed  according  to  the  half-duplex  model. 

The  program  which  performs  the  transformation  from  the 
.RAW file to a half-duplex model is HDMOD, which produces a
corresponding  .HDX  file.  The  naming  convention  of  the  .HDX 
files  is  identical  to  that  for  the  .RAW  files.  Figure  6 
defines  the  elements  of  the  .HDX  files. 
(a)   Normal Sequence

User:      5 + 7<CR>
Network:   12
User:      1 + 2<CR>
Network:   3
User:      9 + 9<CR>
Network:   18

(b)   Queued Sequence

User:      5 + 7<CR>
User:      1 + 2<CR>
User:      9 + 9<CR>
Network:   12
Network:   3
Network:   18

(c)   Interleaved Sequence

User:      5 + 7<CR>
User:      1 + 2<CR>
Network:   12
User:      9 + 9<CR>
Network:   3
Network:   18

where <CR> indicates carriage return


Figure  5.  Full-Duplex  Possibilities 
Figure  6.   Elements  of  an  .HDX  File 
3.4  Echo  Removal 


The echo removal algorithm produces an approximation
for the half-duplex representation of a full-duplex
conversation. While the sequence of characters is
preserved in an echo, the timing is not. As illustrated in
Figure 7, any number of user characters may be interposed
between a user character and its echo. The problem is to
identify when a character transmitted from the network is an
echo and when it is the beginning of a network output
sequence. The algorithm utilized requires that the
beginning of the user stimulus be identified. This was
accomplished by defining any character from the user as
terminating a network transmission. Likewise, any character
from the network which is not an echo terminates the user
transmission. The procedure involves placing user
characters into a buffer, maintaining pointers to the end of
the buffer and to the current user character. Each network
character received is an echo candidate and has to be
compared with the current user character. As long as a
match exists, the pointer is advanced and the process
repeated. When the end of the buffer is reached, the user
transmission is terminated. If there is a nonmatch before
the end of the buffer, the remaining user characters end the
network transmission and begin the next user transmission,
the exact location being independent of the time at which
the various characters occurred. By this definition,
endings and beginnings are strictly determined by time
sequence.


[Figure 7: time line (axis labeled TIME) distinguishing user
characters from echo or network characters]


Figure  7.   Intermixing  of  User  Input  and  Network  Output 
One exception is made for a nonexact match. There is a
DAP option to treat an upper case character output from the
network as a match to the corresponding lower case character
input from the user. This option was provided because this
mode of echoing was observed in practice. The historical
precedence of upper case only terminals may explain its
existence.
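
The matching step of echo removal can be sketched in Python as follows. This is one simplified reading of the procedure, not the DAP implementation: user characters are placed in a buffer, each network character is compared with the oldest user character still awaiting its echo, and a match is dropped from the network stream. Characters still unechoed when a nonmatch occurs simply remain in the buffer, which is an assumption of this sketch. The record layout and names are hypothetical.

```python
from collections import deque


def is_echo(user_char, network_char, fold_case=False):
    """True if network_char echoes user_char (optionally lower to upper case)."""
    if user_char == network_char:
        return True
    return fold_case and user_char.islower() and user_char.upper() == network_char


def remove_echoes(records, fold_case=False):
    """Reduce time-ordered (source, char) records to half-duplex segments."""
    segments = []            # [source, text] pairs in time order
    awaiting_echo = deque()  # user characters not yet echoed

    def append(source, char):
        # Extend the current segment, or start a new one on a change of source.
        if segments and segments[-1][0] == source:
            segments[-1][1] += char
        else:
            segments.append([source, char])

    for source, char in records:
        if source == "user":
            awaiting_echo.append(char)
            append("user", char)
        elif awaiting_echo and is_echo(awaiting_echo[0], char, fold_case):
            awaiting_echo.popleft()   # echo: drop it from the network stream
        else:
            append("network", char)   # genuine network output
    return [(source, text) for source, text in segments]


records = [
    ("user", "l"), ("user", "s"), ("network", "l"), ("network", "s"),
    ("user", "\r"), ("network", "\r"), ("network", "\n"),
    ("network", "o"), ("network", "k"),
]
segments = remove_echoes(records)
```

With `fold_case=True` the sketch also accepts an upper case network character as the echo of the corresponding lower case user character, mirroring the DAP option described above.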

3.5  Conversation  Record 

The  hard  copy  representation  of  the  conversation  is 
that  of  the  half-duplex  SAR  model.  The  format  of  the 
printed  record  is  given  in  Figure  8.  Each  message  group  in 
the  conversation  is  subdivided  into  its  three  components, 
and  the  characters  belonging  to  each  one  of  these  components 
appear  to  the  right  of  the  corresponding  label. 

Control characters are optionally represented by their
standard abbreviation enclosed in corner brackets. For
example, a carriage return would appear as <CR>. For the
sake of readability the space character is treated as a
special case. If a space is the first or last character on
a line, it appears as <SP>; otherwise, a blank character is
printed. Multiple occurrences of control characters are
indicated by printing an asterisk followed by the count of
repetitions, followed by a closing corner bracket. For
example, seven linefeeds would appear as <LF>*7>.
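
The printing conventions just described can be sketched as below. The function names are hypothetical, and the sketch makes two simplifying assumptions: a "line" is the whole string passed in, and only control characters (not spaces) are collapsed into repeat counts.

```python
# Standard ASCII abbreviations for codes 0 through 31.
CONTROL_NAMES = ("NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
                 "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US").split()


def abbreviation(ch):
    """Return the abbreviation of a control character, or None."""
    code = ord(ch)
    if code < 32:
        return CONTROL_NAMES[code]
    if code == 127:
        return "DEL"
    return None


def render_line(chars):
    """Render one line of a conversation record."""
    out = []
    i, n = 0, len(chars)
    while i < n:
        ch = chars[i]
        name = abbreviation(ch)
        if name is not None:
            j = i
            while j < n and chars[j] == ch:  # measure the run of repeats
                j += 1
            run = j - i
            out.append("<%s>" % name if run == 1 else "<%s>*%d>" % (name, run))
            i = j
        elif ch == " " and (i == 0 or i == n - 1):
            out.append("<SP>")  # leading or trailing space is made visible
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```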

The  .HDX  file,  from  which  the  SAR  model  is  derived, 
contains  time/date  records  in  addition  to  the 
character/time-tag  information.  The  occurrence  of  a 
time/date  record  is  preserved  in  the  conversation  record  by 
the  printing  of  the  message: 

Recording  Time:   XXXX 

where  XXXX  represents  the  hour.  The  execution  of  the 
analysis  routines  does  not  necessitate  a  hard  copy 
reproduction  of  the  conversation,  but  if  a  reproduction  is 
requested,  the  user  specifies  the  terminal  being  used.  The 
program  maintains  pertinent  statistics  (i.e.,  the  number  of 
rows  and  columns  per  page  and  per  inch)  for  a  variety  of 
terminals.

The  user  of  the  DAP  has  the  ability  to  designate  that 
any  or  all  of  the  previously  described  model  parameters  be 
calculated.  Further,  the  analyst  has  the  option  to  have 
these  parameters  printed.  The  format  is  that  each  message 
group  is  printed  followed  by  its  corresponding  parameters. 

CONVERSATION RECORD OF FILE 184003.HDX

#### LEGEND ####
S=Stimulus   R=Response   A=Acknowledgement
####        ####

S  1  @

A  1  <CR><NUL>*5><LF>*6>

R  1  OMNUS AUTOBAUD.<SP>*2>RATE = 3 (300) ON PORT 33.<CR><NUL>
*5><LF>*5>PLEASE ENTER SITE-ID.<CR><NUL>*5><LF>*6>

RECORDING TIME: JULY 3, 1975 44160SEC

S  2  XXXXXX

A  2  <CR><LF><DEL><CR><DEL>*6>

R  2  ENTER USERID/PASSWORD:<DEL><CR><LF><DEL>*6><DEL>

S  3  *xxxxxx/yyyyy<CR>

A  3  <CR><LF><DEL>*7>

R  3  <DEL><CR><LF>*9><DEL>**>*DESTROY USERID/PASSWORD ENTRY
<DEL><CR><LF><DEL><CR><DEL>*6>*UNIVAC 1100 OPERATING SYSTEM<SP>
*2>VER. 31-244 UPD D(RSI)*<SP>*2><DEL>*2><CR>*2><LF>*2><DEL>*2>

S  4  @@tty l,_,c,",w,80<CR>

A  4  <CR><LF><DEL><CR><DEL>*6>

R  4  * @@ PROCESSING COMPLETE *<SP>*2><DEL>*2><CR>*2><LF>*2><DEL>*2>

S  5  @run xxxxxx,12345-xxxxxx,yyyyyy,8,800<CR>

A  5  <CR><LF><DEL><CR><DEL>*6>

R  5  DATE: 070375<SP>*6>TIME: 121858<SP>*2><DEL>*2><CR>*2><LF>
*2><DEL>*8><DEL>

S  6  @add setup.<CR>

A  6  <CR><LF><DEL><CR><DEL>*6>

R  6  READY<SP>*3><DEL>*3><CR>*3><LF>*3><DEL>*3><CR>*3><DEL>*8>R
EADY<SP>*3><DEL>*3><CR>*3><LF>*3><DEL>*8>READY<SP>*3><DEL>
*3><CR>*3><LF>*3><DEL>*8>READY<SP>*3><DEL>*3><CR>*3><LF>
*3><DEL>*8>READY<SP>*3><DEL>*3><CR>*3><LF>*3><DEL>*8>R
EADY<SP>*3><DEL>*3><CR>*3><LF>*3><DEL>*8>READY<SP>*3><DEL>
*3><CR>*3><LF>*3><DEL>*8>READY<SP>*3><DEL>*3><CR>*3><LF>
*3><DEL>*3><CR>*3><DEL>*8>FURPUR NS26H-07/03-12:19<DEL><CR>
<LF><DEL><CR><DEL>*6>CSDACCTTEST*TPF$(0),F4,T<DEL><CR><LF>
<DEL>*6>CSDACCTTEST*SETUP(1),F2,A<SP>*2>SYM$$$001016,(
ADD)<SP>*3><DEL>*3><CR>*3><LF>*3><DEL>*8>DINES*UOM(0),DUMM
Y<SP>*2>Q,<SP>*2><DEL>*2><CR>*2><LF>*2><DEL>*7>CSDAC
CTTEST*GEORGE(1),F2,A<SP>*2>G,<SP>*2><DEL>*2><CR>*2><LF>
*2><DEL>*7>CSDACCTTEST*BASE(1),F2,A<SP>*2>B,<DEL><CR>
<LF><DEL>*6>CSDACCTTEST*UPDATE(1),F2,A<SP>*2>U,<SP>
*2><DEL>*2><CR>*2><LF>*2><DEL>*7>CSDACCTTEST*GG(1),F2,A<SP>
*2><DEL>*2><CR>*2><LF>*2><DEL>*7><SP>*2>END PRT<SP>
*3><DEL>*3><CR>*3><LF>*3><DEL>*9><DEL>

Figure 8. Sample Conversation Record

4. Data Structuring

Once  the  message  groups  and  their  corresponding 
components  are  identified,  this  identification  is  retained 
in  order  that  the  process  need  not  be  repeated  in  future 
analysis  sessions.  An  index  (.INX)  file  which  contains 
pointers  (addresses)  into  the  half-duplex  (.HDX)  data  file 
was  implemented  so  that  specific  message  groups  may  be 
accessed  without  reprocessing  the  entire  data  base.  The 
program  which  produces  the  .INX  files  is  SARTST  and  the 
naming  convention  for  the  .INX  files  is  identical  to  that 
for  the  .RAW  and  .HDX  files.  Figure  9  represents  the  flow 
of  data  from  the  original  .DMT  file  to  the  .INX  file. 

4.1 Type of Structure

The  index  file  has  been  organized  to  best  represent  the 
form  of  the  conversational  data  and  to  take  advantage  of  the 
random  file  addressing  capability  of  the  computer  employed 
for  data  analysis.  This  organization  facilitates  the 
efficient retrieval of data. The structure employed is a
series of linked lists, some of which are doubly linked. The
base  of  the  structure  contains,  explicitly  or  via  pointers, 
descriptive  information  concerning  the  conversation.  This 
descriptive  information  includes  the  date  of  recording,  the 
network  and  user  participating,  the  length  of  the 
half-duplex  file,  etc.  Since  the  index  file  is  implemented 
on  a  random  access  medium,  this  descriptive  information  may 
be  modified  or  expanded  at  any  time.  A  utility  program 
provides  this  capability. 

As shown in Figure 10, the base also contains a pointer
to the first message group. The message group headers are
represented in a doubly linked list. Each message group
header  is  numbered  and  contains  the  address  in  the 
half-duplex  file  of  the  beginning  of  the  stimulus, 
acknowledgement,  and  response  for  that  message  group.  If  a 
time/date  record  occurs,  a  pointer  to  that  record  is  placed 
in  the  message  group  header. 
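
The index structure described above can be pictured with the following sketch. It is an in-memory analogue only: the class and field names are hypothetical, file addresses are modeled as plain integers, and the `last_group` pointer is a convenience added for the example rather than something the report specifies.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class MessageGroupHeader:
    number: int
    stimulus_addr: int            # address in the half-duplex file
    ack_addr: int
    response_addr: int
    time_date_addr: Optional[int] = None  # set only if a time/date record occurs
    prev: Optional["MessageGroupHeader"] = field(default=None, repr=False, compare=False)
    next: Optional["MessageGroupHeader"] = field(default=None, repr=False, compare=False)


@dataclass
class IndexBase:
    recording_date: str
    network: str
    user: str
    hdx_length: int
    first_group: Optional[MessageGroupHeader] = None
    last_group: Optional[MessageGroupHeader] = field(default=None, repr=False)

    def append(self, header: MessageGroupHeader) -> None:
        """Link a new header at the end of the doubly linked list."""
        header.prev = self.last_group
        if self.last_group is None:
            self.first_group = header
        else:
            self.last_group.next = header
        self.last_group = header

    def groups(self) -> Iterator[MessageGroupHeader]:
        """Walk the list forward from the base."""
        node = self.first_group
        while node is not None:
            yield node
            node = node.next
```

The backward links allow a message group to be examined together with its predecessor without rescanning from the base.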

Access  to  structured  data  is  provided  through  a  set  of 
interface  subroutines  which  serve  to  insert  and  retrieve 
data  by  message  group.   Subroutines   have   been   written   to 

Figure 10. Directed Graph of Full-Duplex Data Index File Structure

4.2 The Structured File

Each word in the half-duplex file is examined by SARTST
during the creation of the index file. Information records,
which are identified by bit zero being set to one (1), are
handled separately as discussed below.

4.2.1  Treatment  of  Character  Records 

Only the character and the bit indicating source are of
interest for purposes of identifying the stimulus,
acknowledgement, and response. As a matter of fact, the
character is only of interest in identifying the
acknowledgement. The stimulus beginning is identified by
the change of source from network to user. This beginning
address and the number of stimulus characters are sufficient
input to the creation of the index file.

The  change  of  source  from  user  to  network  identifies 
the  end  of  the  stimulus  and  the  beginning  of  either  the 
acknowledgement  or  response.  The  algorithm  to  determine  the 
number  of  characters  in  the  acknowledgement  is  coded  in  a 
separate  subroutine  which  was  described  in  section  3.1.  The 
acknowledgement  may  consist  of  any  non-negative  number  of 
characters,  including  zero.  The  algorithm  defining  the 
acknowledgement  is  compiled  as  a  table  for  each  network 
identified  in  the  half-duplex  file  headers.  The  analyst  has 
the  option  of  changing  the  acknowledgement  definition. 
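
The boundary rules above can be sketched as follows. The record layout (a list of source/character pairs with sources `"U"` and `"N"`) and the function names are assumptions for the example. The acknowledgement rule is passed in as a function because the report defines it per network; the rule used in the test, "leading control characters", is purely illustrative.

```python
def close_group(stimulus, output, ack_length):
    """Split the network output of one message group into A and R parts."""
    n = ack_length(stimulus, output)  # may be zero
    return {"S": "".join(stimulus),
            "A": "".join(output[:n]),
            "R": "".join(output[n:])}


def split_message_groups(records, ack_length):
    """Divide (source, char) records into message groups.

    A change of source from network to user begins a new stimulus;
    a change from user to network ends the stimulus.
    """
    groups = []
    stimulus, output = [], []
    for source, ch in records:
        if source == "U":
            if output:  # network -> user: previous group is complete
                groups.append(close_group(stimulus, output, ack_length))
                stimulus, output = [], []
            stimulus.append(ch)
        else:
            output.append(ch)
    if stimulus or output:
        groups.append(close_group(stimulus, output, ack_length))
    return groups


def leading_controls(stimulus, output):
    """Illustrative acknowledgement rule: count leading control characters."""
    n = 0
    while n < len(output) and (ord(output[n]) < 32 or ord(output[n]) == 127):
        n += 1
    return n
```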

4.2.2  Treatment  of  Information  Records 

Information  records  are  signaled  by  bit  zero  being  set 
to  one  (1).  Counting  from  left  to  right,  bits  one  through 
seven  contain  a  one's  complement  negative  identification 
number.  Implicit  in  the  identification  is  the  number  of 
words  which  contain  the  information.  Presently,  the  only 
information  record  defined  in  the  data  structure  file  is  the 
time/date  record. 
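
One reading of this encoding is sketched below. Two assumptions are made loudly: the word length (`WORD_BITS = 36`) is not stated in this section, and bits one through seven are taken to hold the bitwise complement of the identification number, with bit zero acting as the sign. The function names are hypothetical.

```python
WORD_BITS = 36  # assumed word length; not stated in this section


def make_info_word(ident):
    """Build the first word of an information record with id 1..127."""
    complement = (~ident) & 0x7F            # one's complement, 7 bits
    return (0x80 | complement) << (WORD_BITS - 8)


def info_record_id(word):
    """Return the identification number, or None for a character record."""
    if not (word >> (WORD_BITS - 1)) & 1:   # bit zero is the leftmost bit
        return None
    complement = (word >> (WORD_BITS - 8)) & 0x7F  # bits one through seven
    return (~complement) & 0x7F
```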

A  time/date  record  is  not  associated  with  a  character; 
rather  it  is  associated  with  an  entire  message  group.  More 
than  one  time/date  record  within  a  message  group  is  not 
atypical,  especially  when  excessive  stimulus  delay  time 
(i.e., think time) indicates user distraction. Provision is
therefore made for associating multiple time/date records with
a message group by chaining.

Reading of the .INX file proceeds sequentially from the
beginning of the stimulus, acknowledgement, or response and
continues for the number of characters indicated by the
count in the message group header. Since sequential reading
may encounter information records, the subroutines which do
the actual file reads skip over information records
automatically.
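
The skipping behaviour can be sketched as below. The names are hypothetical, and `info_length` is an assumed helper that returns the number of words occupied by an information record starting at a given word, or zero for a character record.

```python
def read_component(words, start, count, info_length):
    """Read `count` character records beginning at index `start`.

    Information records met along the way are skipped and do not
    count toward the number of characters read.
    """
    chars = []
    pos = start
    while len(chars) < count and pos < len(words):
        skip = info_length(words[pos])
        if skip:
            pos += skip          # step over the whole information record
        else:
            chars.append(words[pos])
            pos += 1
    return chars
```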

5.   Statistical  Treatment  of  the  Data 

Thus  far  a  variety  of  techniques  for  the  defining  and 
structuring  of  data  has  been  discussed.  Now  the  procedures 
for  analyzing  the  described  data  will  be  explained  (3).  The 
program  responsible  for  the  statistical  treatment  of 
individual  conversations  is  SARANS. 

5.1  Measured  Data  and  Derived  Data 

The  characters  with  associated  time-tags  obtained  by 
the  NMM  constitute  the  measured  data.  The  DAP  introduces 
derived  data  which  are  arithmetic  combinations  of  the 
measured  data.  These  derived  data  are  basically  the  counts 
of  characters  occurring  in  the  stimulus,  acknowledgement  and 
response,  or  the  time  delays  between  the  boundaries  of  one 
or more of these message group components. More complicated
arithmetic is employed for some of the derived data in the
form of ratios and, perhaps, ratios of sums. The percentage
of  printing  and  nonprinting  characters,  the  percentage  of 
network  characters,  and  the  transmission  rates  are  all 
examples  of  such  derived  data.  The  statistics  must  be 
accumulated  and  processed  individually  for  each  of  the 
derived  data  parameters.  It  is  unlikely  that  all  of  the 
derived  data  parameters  will  be  of  interest  to  all  users  of 
the  DAP.  Therefore,  the  DAP  user  is  presented  with  a  choice 
of  the  parameters  to  have  computed.  While  it  may  be  more 
convenient  to  give  this  choice  at  the  time  of  execution, 
implementation  constraints  require  a  compile  time  selection. 
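
As an illustration of how derived data follow from measured data, the sketch below computes a few such parameters for one message group component. The function name and the exact definitions are assumptions: delay is taken as the time from the end of the preceding component to the first character, and transmit time as the time from the first to the last character.

```python
def derive(chars, times, previous_end):
    """Compute derived parameters for one non-empty component.

    chars: the characters of the component
    times: the time-tag of each character, in seconds
    previous_end: time-tag of the last character of the preceding component
    """
    count = len(chars)
    delay = times[0] - previous_end
    transmit = times[-1] - times[0]
    nonprinting = sum(1 for c in chars if ord(c) < 32 or ord(c) == 127)
    return {
        "count": count,
        "delay": delay,
        "transmit": transmit,
        # A ratio; undefined when all characters share one time-tag.
        "rate": count / transmit if transmit > 0 else None,
        "pct_nonprinting": 100.0 * nonprinting / count,
    }
```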

5.2  Standard  Sampling  Interval 

The  intent  of  the  DAP  is  the  measurement  of  activity 
typical  of  a  user/network  dialogue,  not  the  measurement  of 
anomalies;  therefore,  it  is  reasonable  to  eliminate 
outliers for the calculation of the statistics. For
example, it is possible for an on-line network user to
become distracted by and involved in an activity totally
unrelated to network usage. It is also possible for a
network to crash at any point during a conversation. Such
events produce distorting data. While the number of times a
network unexpectedly disconnects (crashes) over a given
period may be interesting, one such crash could give an
unrealistic slant to a given set of statistics by producing
an abnormally long response time. To recognize the presence
of these data, upper and lower limits must be used. These
limits determine the standard sampling interval. Data must
fall within the interval to be considered in the statistics.
The acknowledgement and response delay time limits should be
set high enough to assure that the network has crashed and
is not just heavily loaded. The acknowledgement and
response transmission time limits are based on the average
character speed.

(3) The reader unfamiliar with any statistical terms used is
referred to Mathematics Definitions (pp. 504-532), and
Statistics Definitions (pp. 533-560) of Sippl [1967].

The upper limits of stimulus delay and transmit times
are set with the purpose of eliminating data that reflect
a user who has become involved in activities unrelated to
network usage.
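
Screening against the standard sampling interval amounts to a simple filter, sketched below. The function name is hypothetical, and treating both limits as inclusive is an assumption.

```python
def screen(values, lower, upper):
    """Keep values inside the standard sampling interval.

    Returns (kept_values, anomalous_count).
    """
    kept = [v for v in values if lower <= v <= upper]
    return kept, len(values) - len(kept)
```

For example, with response delay limits of 0 and 60 seconds, a 900 second delay caused by a crash would be counted as anomalous rather than averaged in.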

5.3  Frequency  Count  Distribution 

By  dividing  the  standard  sampling  interval  into  a 
number  of  subintervals  it  is  possible  to  characterize  the 
distribution  of  the  derived  parameters  by  counting  the 
number  of  occurrences  of  a  parameter  value  in  each  of  the 
subintervals.  These  equal  class  intervals  may  be  directly 
utilized  in  statistical  analysis,  as  discussed  in  section 
5.5  below.  The  finer  the  division,  the  greater  the  number 
of  subintervals,  and  the  more  accurate  the  representation  of 
the  distribution.  This  subdivision  of  data  is  necessary  for 
the  production  of  histograms.  Practical  programming 
considerations  limit  the  number  of  subintervals;  it  is, 
therefore,  desirable  to  make  the  standard  sampling  interval 
as  small  as  possible.  By  this  criterion  the  upper  and  lower 
limits  on  the  standard  sampling  interval  should  be  carefully 
chosen  to  exclude  uninteresting,  as  well  as  anomalous 
values. Since there is a minor conflict between the two
preceding criteria for establishing the standard sampling
interval, this selection should be done very carefully and
the results of the analysis studied to detect a possible
bias introduced by the interval choice.
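
A frequency count over equal class intervals can be sketched as follows. The function name and the handling of a value exactly equal to the upper limit (counted in the last class) are assumptions.

```python
def frequency_counts(values, lower, upper, classes):
    """Count occurrences in `classes` equal subintervals of [lower, upper]."""
    width = (upper - lower) / classes
    counts = [0] * classes
    for v in values:
        if not (lower <= v <= upper):
            continue  # outside the standard sampling interval
        k = min(int((v - lower) / width), classes - 1)
        counts[k] += 1
    return counts
```

Doubling `classes` halves the class width, which shows the trade-off noted above: finer resolution costs proportionally more storage for the same interval.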

Figure 11 is a sample of the histograms currently
produced.  The  vertical  axis  is  divided  into  intervals, 
called  frequency  classes,  which  reflect  the  values  of  the 
data;  this  may  be  time  in  seconds  or  number  of  characters. 
The  horizontal  axis  represents  the  number  of  occurrences. 
The  exact  number  of  occurrences  for  a  class  appears  in 
parentheses  next  to  the  base  class  value,  which  represents 
the  smallest  value  a  data  item  may  have  to  be  represented  in 
the  class.  The  total  number  of  occurrences  in  all  classes 
appears  at  the  bottom  of  the  histogram. 

5.4  Analysis  Results 

In addition to the histograms, statistical measures of
the data including the mean, standard deviation, median
(50th percentile), and the 90th and 95th percentiles are
provided.

The mean and standard deviation of the derived
parameters may be calculated from the frequency count
distribution. As a check, the mean and standard deviation
are also calculated on a cumulative basis as each
observation is recorded. A comparison of these two methods
for calculating the moments produces an independent
confirmation of the intervals used in the frequency count
distribution.
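
The two-way check can be illustrated as below. This is not the SSP routine: the grouped calculation uses class midpoints, and the cumulative calculation uses Welford's update, chosen here only as a convenient running method. All names are hypothetical, and at least two observations are assumed.

```python
import math


def grouped_moments(counts, lower, width):
    """Mean and standard deviation from a frequency count distribution."""
    n = sum(counts)
    mids = [lower + (k + 0.5) * width for k in range(len(counts))]
    mean = sum(f * m for f, m in zip(counts, mids)) / n
    var = sum(f * (m - mean) ** 2 for f, m in zip(counts, mids)) / (n - 1)
    return mean, math.sqrt(var)


class RunningMoments:
    """Mean and standard deviation updated as each observation arrives."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1))
```

A large disagreement between the two results suggests that the class intervals are too coarse for the data.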

In  the  analysis  of  data  the  appropriate  statistical 
measures  depend  on  the  underlying  distribution.  The 
statistical  quantities  shown  in  Figure  11  should  therefore 
be  regarded  as  typical  examples  to  be  replaced  by  other 
quantities  as  desired.  The  mean  and  standard  deviation  are 
included  in  the  statistical  quantities  currently  calculated; 
however,  due  to  the  non-normal  distribution  of  the  data,  the 
median,  90th  and  95th  percentiles  may  be  more  informative. 
The  user  must  also  be  mindful  of  the  fact  that  for  some 
performance  parameters  each  of  these  runs  and  its  summary 
constitute  a  single  "event."  Variations  in  repetition  of  the 
same  basic  event  should  therefore  also  be  investigated  to 
gain  a  better  understanding  of  the  uncertainties  in  the 
network. 

FREQUENCY CLASS BASE VALUE (NUMBER OF OCCURRENCES)
GROUP FREQUENCY RANGE 1.2

TOTAL NUMBER OF OCCURRENCES: 147
MEAN= 10.0 STANDARD DEVIATION: 12.0
MEDIAN= 3.5 90%= 21.6 95%= 37.9

Figure 11. Sample Histogram

Following the printing of the histograms with
accompanying  statistics  for  the  designated  model  parameters 
is  a  conversation  summary.  The  summary  begins  with  a  review 
of  the  statistics  associated  with  each  parameter.  The  speed 
of  the  connection  (recorded  in  the  configuration  record), 
the  number  of  occurrences  of  anomalous  data  (values 
occurring  outside  the  standard  sampling  interval),  and  the 
total  time  of  the  conversation  are  printed.  In  addition,  a 
number  of  measures  concerning  utilization  of  the  connection 
line  are  listed.  All  characters  are  generated  by  either  the 
user  or  the  network;  further,  they  are  either  printing  or 
nonprinting.  A  variety  of  percentages  relative  to  these 
character groupings are calculated. These percentages serve
as  a  guideline  concerning  usage  of  a  connection.  For 
example,  in  order  to  compensate  for  varying  speeds  in 
carriage  control  for  different  transmission  rates,  a  network 
may  always  send  a  given  number  of  nonprinting  characters  at 
the  beginning  of  each  new  line  based  on  the  fastest 
available  speed.  This  technique  avoids  line  dependent 
calculation  of  delays  but  may  cause  the  user  to  pay  for 
transmittal  of  unnecessary  characters. 

Communications or line utilization reflects the actual
use of a transmission line relative to the potential use of
that line. In other words, it is the number of characters sent
on the line during a given transmission period by the user or
network relative to the number that the line could have
transmitted during that time period. Two measures of line
utilization  are  given.  One  defines  the  potential  time 
interval  as  beginning  with  the  first  character  of  a  message 
sent  by  the  source  of  transmission  and  ending  with  the  last 
character  of  that  message.  The  other  measure  incorporates 
in  its  calculation  of  the  potential  time  interval  the  delay 
time  imposed  by  the  source.  These  statistics  help  to 
indicate  if  the  user  has  chosen  an  unrealistic  connection 
speed.  See  Figure  12  for  a  sample  of  that  part  of  the 
summary  describing  utilization  of  the  connection  line. 
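
The two utilization measures can be sketched as follows. The function and parameter names are hypothetical, and line capacity is expressed as characters per second for simplicity.

```python
def line_utilization(char_count, first_time, last_time, delay, chars_per_second):
    """Return (utilization_during_transmission, utilization_including_delay).

    first_time, last_time: time-tags of the first and last character sent
    delay: delay time imposed by the source before the first character
    """
    transmit = last_time - first_time
    if transmit <= 0:
        return None, None  # a single time-tag gives no measurable interval
    during = char_count / (chars_per_second * transmit)
    including = char_count / (chars_per_second * (transmit + delay))
    return during, including
```

A source that keeps the line half full while sending, but pauses as long as it sends, would show 0.5 on the first measure and 0.25 on the second.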

A  .SUM  file  which  contains  the  frequency  distribution 
array  obtained  from  the  analysis  is  created  by  SARANS.  This 
file  conforms  to  the  same  naming  convention  as  .RAW,  .HDX, 
and  .INX.  The  analysis  of  multiple  conversations  is 
performed  by  the  program  MULT  which  creates  a  composite 
frequency  distribution  by  totaling  the  contents  of  the  .SUM 
files  selected. 
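
The totaling performed for multiple conversations can be sketched as below. The .SUM file layout is not given here, so each distribution is modeled as a plain list of class counts, and the function name is hypothetical.

```python
def combine_distributions(distributions):
    """Total several frequency distributions class by class."""
    if not distributions:
        return []
    length = len(distributions[0])
    if any(len(d) != length for d in distributions):
        raise ValueError("distributions must use the same class intervals")
    return [sum(column) for column in zip(*distributions)]
```

The length check reflects the requirement that the conversations being combined were analyzed with the same standard sampling interval and class width.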

Figure 12. Sample Line Utilization Statistics

5.5 Statistical Analysis Support

In  the  implementation  of  the  statistical  analysis 
portions  of  the  DAP,  every  effort  was  made  to  avoid 
duplication  of  existing  available  programs.  A  subroutine 
from  the  machine-independent  Scientific  Subroutine  Package 
(SSP  [1968])  was  used  to  calculate  moments  for  grouped  data 
on  equal  class  intervals.  As  discussed  in  section  5.4,  a 
confirmation  of  the  results  is  provided  by  calculation  of 
the  mean  cumulatively  from  the  ungrouped  observations. 

Since reliable, sophisticated statistical analysis is
available in OMNITAB (Ku [1973]; Hogben, Peavy, and Varner
[1971]), the DAP produces files of the derived data which
may be input to OMNITAB.

5.6  Verification  of  the  Statistical  Results 

It  is  possible  to  test  the  accuracy  of  the  algorithms 
employed  in  the  DAP  by  calculating  the  statistics  by  hand, 
and  then  comparing  the  hand  calculations  with  those  from  the 
DAP. However, this proves tedious for a sample of any
appreciable size, and there is the possibility of human
error.

It  was  decided  that  the  best  way  to  test  the 
statistical  package  was  to  execute  it  using  a  data  base  of 
known  statistical  distributions  of  the  parameters  of  the  SAR 
model.  A  program,  BLDRAW,  was  developed  to  create  such 
data.  The  output  of  BLDRAW  is  a  file  in  the  format  of  the 
.RAW  files  described  previously;  because  the  program 
produces  only  half-duplex  conversations,  the  file  is  also 
identical  to  the  .HDX  files.  The  analyst  specifies  the 
distributions  for  nine  of  the  parameters  calculated  by  the 
analysis  routines:  character  count,  transmit  time  and  delay 
time  for  the  stimulus,  acknowledgement  and  response.  Values 
for  these  parameters  may  be  specified  as  a  constant  or  in 
terms  of  a  mean  and  standard  deviation.  The  program  also 
asks  the  analyst  to  supply  the  following  information: 

i)    name of the file to contain generated data,
ii)   number of message groups in generated conversation, and
iii)  speed of the connection line for generated conversation.

If the analyst chooses a transmit time too small with
respect  to  the  corresponding  character  count  and  the 
capacity  of  the  chosen  line,  the  program  warns  the  analyst 
and  requests  a  new  transmit  time. 

Available  in  SSP  is  a  Gaussian  number  generator,  GAUSS, 
which  produces  a  series  of  random  numbers  which  conform  to  a 
specified  distribution.  BLDRAW  makes  use  of  this  routine  to 
calculate  appropriate  time-tags  and  number  of  characters  in 
each  message  group  component  to  produce  the  analyst-defined 
data  file. 
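
The generation step can be sketched as below. This is an analogue, not BLDRAW: Python's `random.gauss` stands in for the SSP routine GAUSS, all names are hypothetical, characters are spaced evenly across the transmit time, and a transmit time that is too small for the line is raised to the minimum rather than prompting the analyst for a new value.

```python
import random


def draw(spec, rng):
    """Draw a value from a constant or a (mean, standard deviation) pair."""
    if isinstance(spec, tuple):
        mean, std = spec
        return max(0.0, rng.gauss(mean, std))  # negative draws are clipped
    return float(spec)


def build_component(start, count_spec, delay_spec, transmit_spec, cps, rng):
    """Return the time-tags of one generated message group component.

    cps is the line capacity in characters per second.
    """
    count = int(round(draw(count_spec, rng)))
    delay = draw(delay_spec, rng)
    # The line cannot carry `count` characters in less than count / cps seconds.
    transmit = max(draw(transmit_spec, rng), count / cps)
    first = start + delay
    if count <= 1:
        return [first] * count
    step = transmit / (count - 1)
    return [first + k * step for k in range(count)]
```

Passing a seeded `random.Random` instance makes the generated file reproducible, which suits its later use as acceptance test data.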

The data files produced by BLDRAW revealed several
minor errors in the statistical routines, which were
corrected. They also enabled the creation of potential
problem situations never encountered in previous
recordings.

A  set  of  statistics  generated  using  a  file  built  by 
BLDRAW  is  used  as  a  comparison  basis  for  future  acceptance 
testing  of  the  analysis  routines.  When  changes  are  made  to 
the  analysis  package,  statistics  will  be  produced  using  the 
test  data  file  in  order  to  assure  that  the  routines  are 
still functioning properly. Before processing a major
portion of data files, it is advisable to run the test data
file through the analysis routines to ensure that the DAP
computer is behaving normally. BLDRAW's test data file
thereby provides a basis for acceptance criteria for the
analysis routines.

6.   Subsets 

The  working  unit  in  the  interaction  between  the  user 
and  the  network  is  the  message  group  which  may  be  considered 
a  generalization  of  a  transaction  in  special  purpose 
transaction-oriented  networks  such  as  reservation  networks. 
Employing  set  terminology,  the  conversation  is  the  ordered 
set  of  all  message  groups.  Many  other  sets  of  message 
groups  can  be  defined.  The  set  concept  may  also  be  extended 
beyond  the  boundary  of  a  single  conversation,  where  the 
upper  limit  (i.e.,  the  universe  of  discourse)  is  the  entire 
data  base  obtained  by  the  NMM.  For  example,  when  all  of  the 
usage  of  a  given  network  during  a  period  of  time  such  as  a 
month  is  considered,  the  set  encompasses  multiple 
conversations.

In the present context the interest is in sets which
encompass less than a conversation; that is to say, subsets
of a conversation. Two criteria have been developed for the
identification of subsets. One criterion is the
meaningfulness of the information obtained from the message
group. Data screening techniques may be applied to the
message groups to identify those falling outside of a
standard sampling interval. From an entirely different
point of view, message groups may be identified according to
the functional objective with which they are associated.
The use of an identifiable software resource or service is a
criterion for membership of a message group in an identified
subset.  For  example,  in  a  programming  environment  the  use 
of  the  various  language  translators,  the  editor,  the  linking 
loader,  and  the  execution  of  debugging  tools  each  constitute 
a  subset.  There  is  no  requirement  that  the  subset 
definitions  be  mutually  exclusive;  it  is  therefore  possible 
that  an  individual  message  group  may  satisfy  the  definition 
of,  and  therefore  belong  to,  multiple  subsets.  In  the 
present  implementation,  the  pointer  to  each  message  group  in 
the  index  file  also  contains  an  identification  of  subsets  of 
which  that  message  group  is  a  member. 

Subset  identification  makes  it  possible  to  take  various 
samplings  or  cross  sections  through  the  data  base,  depending 
upon  the  objective  of  the  analysis.  Using  the  editor  as  an 
example,  it  is  possible  to  identify  what  percentage  of  the 
message  groups  or  the  elapsed  time  is  spent  in  the  use  of 
this  resource.  It  is  also  possible  to  limit  the  attention 
to  this  editing  resource  and  to  perform  all  of  our 
statistical  analyses  on  it.  The  user  demands  and  network 
responses  associated  with  the  use  of  the  editor  are 
interesting  in  themselves  and  may  be  substantially  different 
from  the  network  assemblage  statistics. 

6.1  Subset  Assignment 

The  resources  and  services  for  a  given  computer  network 
are,  in  general,  fixed  and  well  defined.  They  are  usually 
enumerated  in  user  literature  provided  by  the  network 
operators.  It  is,  therefore,  straightforward  to  identify 
the  subsets  describing  usage  of  a  particular  network.  The 
process  of  identifying  the  subset  to  which  a  message  group 
belongs begins with the identification of the computer and
network  involved  in  the  conversation.  Implicit  in  the 
identification  is  the  selection  of  the  possible  subsets,  the 
union  of  which  describes  the  utilization.  For  the 
convenience  of  the  analyst,  the  subsets  are  identified  by 
name  whenever  analyst  interaction  is  possible.  Of  course, 
internal identification by name would be inefficient, so a
coded mask is employed in the message group header to
identify the subset or subsets to which each individual
message group belongs.
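
A coded mask of this kind can be sketched with one bit per subset. The subset names below are hypothetical, loosely following the programming environment example given earlier, and the bit assignments are arbitrary.

```python
from enum import IntFlag


class Subset(IntFlag):
    EDITOR = 1
    TRANSLATOR = 2
    LOADER = 4
    DEBUGGER = 8


def subset_names(mask):
    """Translate a stored mask back into names for the analyst."""
    return [s.name for s in Subset if s & mask]
```

Because subsets need not be mutually exclusive, a single header mask such as `Subset.EDITOR | Subset.DEBUGGER` records membership in both.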

6.2 Assignment of Message Groups to Subsets

Automation  of  assigning  a  message  group  to  one  or  more 
subsets  is  accomplished  by  a  specially  written  program.  The 
program  is  strictly  dependent  on  the  network  and  network 
computer  being  analyzed.  This  program  must  incorporate 
recognition  of  the  software  services  available,  the  syntax 
of  the  command  language,  and  the  local  editing  conventions. 
At  this  time  one  such  special  program,  called  TT1108,  has 
been  implemented  to  identify  subsets  used  on  the  Univac  1108 
operating  under  Exec  8.  The  program  employs  a  sliding 
character  string  match  technique,  similar  to  that  used  to 
detect  the  end  of  a  conversation  and  the  end  of  an 
acknowledgement,  to  identify  the  subset  to  which  each 
message  group  belongs. 
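
A sliding character string match of the kind mentioned above can be sketched as follows. The pattern table is illustrative only and is not taken from TT1108; the subset names and command strings in it are assumptions, as are the function names.

```python
# Illustrative table: subset name -> command strings that indicate it.
PATTERNS = {
    "EDITOR": ["@ED"],
    "TRANSLATOR": ["@FOR", "@COB"],
}


def contains(text, pattern):
    """Slide the pattern along the text, comparing at each position."""
    for start in range(len(text) - len(pattern) + 1):
        if text[start:start + len(pattern)] == pattern:
            return True
    return False


def assign_subsets(stimulus, patterns):
    """Return the names of all subsets whose patterns occur in the stimulus."""
    text = stimulus.upper()
    return [name for name, needles in patterns.items()
            if any(contains(text, needle) for needle in needles)]
```

A table this simple ignores command syntax and local editing conventions, which is why the text notes that manual correction may still be needed.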

After  subsets  have  been  identified  by  this  program,  it 
may  be  necessary  to  make  changes  to  account  for  anomalies  or 
to  correct  errors.  These  changes  must  be  performed  manually 
by  the  analyst.  The  analyst  works  with  a  list  of  the 
conversation  which  was  previously  prepared  and  interactively 
assigns  one  or  more  subsets  to  each  message  group  through 
the  program  DSINFO. 

6.3  Analysis  of  Data  by  Subsets 

The statistical routines described earlier (histograms,
statistics, summary) are available for
operation on the subsets. They may be applied to individual
subsets or to logical combinations of subsets. Therefore,
it  is  possible,  for  instance,  to  analyze  the  response  delay 
time  experienced  in  invoking  an  editor  at  various  points  in 
a  conversation. 
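
Selection by logical combinations of subsets can be sketched with bitwise tests on the header mask. The function name, its keyword arguments, and the pairing of each mask with a single parameter value are assumptions for the example.

```python
def select(groups, include_any=0, include_all=0, exclude=0):
    """Return parameter values of message groups matching a subset combination.

    groups: list of (mask, value) pairs, one per message group
    include_any: keep groups in at least one of these subsets (0 = no test)
    include_all: keep groups in every one of these subsets
    exclude: drop groups in any of these subsets
    """
    chosen = []
    for mask, value in groups:
        if include_any and not (mask & include_any):
            continue
        if (mask & include_all) != include_all:
            continue
        if mask & exclude:
            continue
        chosen.append(value)
    return chosen
```

The selected values can then be passed to the same screening, frequency count, and moment calculations used for a whole conversation.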

Although the analysis of subsets within a given
conversation may prove sufficient for some users, analysis of
corresponding  subsets  within  a  variety  of  conversations  is 
necessary  for  others.  For  example,  statistics  may  be 
obtained  on  network  performance  during  a  specified  time 
period  over  a  number  of  days  to  verify  the  time  dependent 
variation  in  network  utilization.  This  analysis  of  multiple 
.SUM  files  is  done  with  the  previously  described  program 
MULT.  To  assure  compatibility  in  the  analysis  of  .SUM  files 
such  information  as  subset  definitions  is  maintained  in 
headers  to  the  .SUM  files. 

7. Applications Areas

The  NMS  can  be  used  in  the  design,  selection,  and 
improvement  of  computer  networks  and  network  services.  This 
section  specifies  how  the  NMS  can  help  Federal  agencies  in 
these  three  network  related  areas.  This  subject  is 
discussed  in  detail  by  Abrams,  Watkins  and  Cotton  [1975]. 

7.1 Design

In  the  design  of  a  computer  network  the  measurement  of 
external  performance  is  useful  in  specifying  service  design 
goals.  The  NMS  can  be  used  by  designers  for  such 
specification  and  later  for  the  testing  of  the  design  to 
determine  if  design  goals  are  being  met;  that  is,  that  the 
service  specified  is  being  provided.  This  procedure  also 
provides  a  common  ground  for  network  designers  and  those  for 
whom they are designing the network. It is often difficult
for agencies to explicitly state the service they expect
from a proposed network. Applying the NMS to known
networks yields analyzed data that provide agencies with a
basis for service specification.

7.2  Selection 

The  selection  of  computer  networks  and  network  services 
provides  three  areas  of  application  of  the  NMS.  The  NMS  can 
be used to determine user requirements. When the NMM is
applied to a current network, the analysis of the data
acquired informs agencies of current service
capabilities. The agencies can then specify their
requirements with current service levels as a comparison
base for determining the minimum service level required.

Once  user  requirements  are  determined,  a  selection 
among  networks  must  be  made.  The  NMM  is  applied  to  each 
candidate  network  during  execution  of  a  specified  benchmark. 
For  a  complete  evaluation  of  network  performance,  it  is 
recommended  that  this  procedure  be  conducted  during  both 
high  and  low  network  load  periods.  A  variety  of  aspects  of 
the  network  can  be  analyzed:  degradation  during  high  load 
periods,  utilization  of  the  communication  lines,  the  time 
the  network  requires  to  process  stimuli,  and  the  time  needed 
to  output  responses. 

The  third  application  occurs  after  selection  and  is 
especially  pertinent  to  the  selection  of  network  services. 
The  NMS  is  periodically  applied  to  a  network  service  during 
the  execution  of  the  same  benchmark  used  in  the  selection 
process.  The  analysis  of  acquired  data  will  assure  that 
service  is  not  degrading  below  an  acceptable  level. 

7.3  Improvement 

The  NMS  can  be  used  to  measure  the  effect  on  service 
caused  by  a  change  within  the  network.  For  this 
application,  the  NMM  should  be  applied  before  and  after  the 
change  is  made. 

The  NMS  can  also  support  the  operation  of  computer 
networks  by  characterizing  user  habits  and  the 
communications  characteristics  of  the  network.  The  data 
will  show  which  facilities  of  the  network  are  most  heavily 
used,  thus  indicating  which  optimization  efforts  should 
receive  priority.  Efficient  multiplexing  and  concentrating 
in  a  network  are  key  factors;  the  analysis  of  data  will 
indicate  such  statistics  as  the  users'  average  transmission 
rate,  which  will  help  in  the  design  of  multiplexors  and 
concentrators  for  specific  networks. 

The  NMS  data  can  be  used  in  determining  the  overhead 
introduced  by  the  communications  portion  of  a  network.  When 
the  network  charging  algorithm  incorporates  the  number  of 
characters  transmitted,  the  overhead  of  nonprinting 
characters  (which  do  not  convey  useable  information  to  the 
user)  becomes  more  significant.  Another  area  of 
investigation  is  the  transmission  rate  of  the  communications 
facility.  Arguments  concerning  the  desirability  of 
asymmetrical  transmission  rates,  with  a  high  transmission 
rate  from  computer  network  to  user,  may  find  support  in 
these  statistics.  By  providing  such  data  as  line 
utilization,  the  NMS-produced  statistics  can  indicate  to 
Federal  agencies  areas  of  inefficient  network  functioning. 
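
As a rough illustration of the overhead measurement described above, the following Python sketch computes the fraction of transmitted characters that are nonprinting. It is not part of the DAP. The function name, the sample transmission, and the rule that ASCII control codes and DEL count as nonprinting are all assumptions made for the example.

```python
def nonprinting_overhead(transmission):
    """Return the fraction of characters in a transmission that are nonprinting.

    Assumption: ASCII control codes (0-31) and DEL (127) are treated as
    nonprinting; every other character is treated as printing.
    """
    if not transmission:
        return 0.0
    nonprinting = sum(
        1 for ch in transmission if ord(ch) < 32 or ord(ch) == 127
    )
    return nonprinting / len(transmission)


# Hypothetical computer-to-user output: a five-character prompt, a carriage
# return and line feed, and three NUL padding characters.
sample = "READY\r\n\x00\x00\x00"
print(f"overhead: {nonprinting_overhead(sample):.0%}")  # prints "overhead: 50%"
```

Under a charging algorithm based on character counts, this fraction is the share of the charge attributable to characters that convey nothing to the user.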

8.   Summary 

This  report  has  described  the  Network  Measurement 
System  (NMS)  with  particular  attention  to  the  interpretation 
of  data  by  the  Data  Analysis  Package  (DAP)  component.  The 
processing  necessary  to  produce  meaningful  service 
measurements  from  the  data  acquired  by  the  Network 
Measurement  Machine  (NMM)  has  been  explained.  The 
processing  performed  by  the  DAP,  including  format 
reconfiguration,  structuring,  and  statistical  treatment,  has 
been  discussed. 

The  intent  in  the  design  of  the  DAP  was  to  present  the 
NMM  data  in  such  a  form  as  to  permit  the  most  flexible  use. 
The  index  structure  allows  selective  access  to  any  portion 
of  the  half-duplex  representation  of  the  conversational 
data.  The  subset  capability  enables  the  user  of  the  DAP  to 
concentrate  analysis  on  one  specific  network  function  or  set 
of  functions,  even  across  conversation  boundaries. 

The  statistics  calculated  by  the  current  DAP  should  be 
regarded  as  typical  and  may  be  replaced  or  supplemented  by 
others  desired  by  users  of  the  DAP.  Among  the  statistics 
calculated,  the  median  and  the  90th  and  95th  percentiles  may 
be  more  informative  than  the  mean  in  the  comparison  of  two 
networks  or  network  services,  because  the  data  are  not 
normally  distributed. 
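
The point about non-normal data can be illustrated with a short Python sketch. The two sets of response times are invented for the example, and the nearest-rank percentile rule is an assumption; the percentile method used by the DAP is not specified here. Both sets have the same mean, yet the median and upper percentiles separate them clearly.

```python
import math
from statistics import mean, median


def percentile(values, p):
    """Nearest-rank percentile: the smallest observation such that at
    least p percent of the observations are less than or equal to it."""
    if not values:
        raise ValueError("no observations")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(response_times):
    """Collect the mean, median, and 90th and 95th percentiles."""
    return {
        "mean": mean(response_times),
        "median": median(response_times),
        "p90": percentile(response_times, 90),
        "p95": percentile(response_times, 95),
    }


# Hypothetical response times in seconds. Network A is usually fast but
# has one very slow response; network B is uniformly moderate.
network_a = [0.8, 0.9, 1.0, 1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 9.4]
network_b = [1.6, 1.7, 1.8, 1.8, 1.9, 1.9, 2.0, 2.0, 2.1, 2.2]

for name, times in (("A", network_a), ("B", network_b)):
    print(name, summarize(times))
```

In this invented case the means are equal, but network A has the lower median (1.1 s against 1.9 s) and a far higher 95th percentile (9.4 s against 2.2 s), a difference the mean alone would hide.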

The  NMS  was  designed  to  provide  a  quantitative  basis 
for  the  evaluation  of  a  computer  network  and  computer 
network  service  for  the  purposes  of  improvement  and 
selection.  It  is  hoped  that  this  work  will  prove  useful  to 
Federal  agencies  operating  their  own  networks  or  in  the 
process  of  selecting  a  network  or  network  service. 